Conditional Random Fields for Automatic Punctuation
نویسنده
چکیده
We model the relationship between sentences and their punctuation labels using conditional random fields. Some feature functions are hand-designed and others are generated by templates. We train the same model by stochastic gradient ascent, Collins Perceptron and contrastive divergence respectively and compare their performance. On the provided dataset, we achieve word-level accuracy of 94.56%. At last, we propose a heuristic that can deal with cost-sensitive tasks.
منابع مشابه
Automatic Comma Insertion of Lecture Transcripts Based on Multiple Annotations
To enhance readability and usability of speech recognition results, automatic punctuation is an essential process. In this paper, we address automatic comma prediction based on conditional random fields (CRF) using lexical, syntactic and pause information. Since there is large disagreement in comma insertion between humans, we model individual tendencies of punctuation using annotations given b...
متن کاملImproved models for automatic punctuation prediction for spoken and written text
This paper presents improved models for the automatic prediction of punctuation marks in written or spoken text. Various kinds of textual features are combined using Conditional Random Fields. These features include language model scores, token n-grams, sentence length, and syntactic information extracted from parse trees. The resulting models are evaluated on several different tasks, ranging f...
متن کاملA CRF Sequence Labeling Approach to Chinese Punctuation Prediction
This paper presents a conditional random fields based labeling approach to Chinese punctuation prediction. To this end, we first reformulate Chinese punctuation prediction as a multiple-pass labeling task on a sequence of words, and then explore various features from three linguistic levels, namely words, phrase and functional chunks for punctuation prediction under the framework of conditional...
متن کاملGWU-HASP: Hybrid Arabic Spelling and Punctuation Corrector
In this paper, we describe our Hybrid Arabic Spelling and Punctuation Corrector (HASP). HASP was one of the systems participating in the QALB-2014 Shared Task on Arabic Error Correction. The system uses a CRF (Conditional Random Fields) classifier for correcting punctuation errors, an open-source dictionary (or word list) for detecting errors and generating and filtering candidates, an n-gram l...
متن کاملBetter Punctuation Prediction with Dynamic Conditional Random Fields
This paper focuses on the task of inserting punctuation symbols into transcribed conversational speech texts, without relying on prosodic cues. We investigate limitations associated with previous methods, and propose a novel approach based on dynamic conditional random fields. Different from previous work, our proposed approach is designed to jointly perform both sentence boundary and sentence ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014